Scrapy: How do I write nested requests? I can't tell what's wrong with my demo

Date: 2017-06-07

Question: Scrapy: how do I write nested requests? I can't tell what's wrong with my demo.

Description:

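I'm using Python 3.5.2 and Scrapy 1.1. The demo below involves a nested AJAX request: the article page fires an AJAX request that displays the article's author. The author is shown only when you are logged in, not otherwise. While crawling the article page, I want to issue this AJAX request as well and parse the page it returns. I can't tell what is wrong with the way I've written it below.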

demo:

    # -*- coding: utf-8 -*-
    import scrapy
    from demo.items import ExampleItem
    from scrapy.spiders import CrawlSpider
    import re


    class ExampleSpider(CrawlSpider):
        name = "example"
        allowed_domains = ["example.com"]
        start_urls = [
            "http://www.example.com/articles-list.php?page=1",
            "http://www.example.com/articles-list.php?page=2",
            "http://www.example.com/articles-list.php?page=3",
            "http://www.example.com/articles-list.php?page=4",
            "http://www.example.com/articles-list.php?page=5",
            "http://www.example.com/articles-list.php?page=6",
        ]
        headers = {
            'Accept':'*/*',
            'Accept-Encoding':'gzip, deflate, sdch',
            'Accept-Language':'zh-CN,zh;q=0.8',
            'Connection':'keep-alive',
            'Cookie':'PHPSESSID=12345370000029b72333333dc999999; QS[uid]=100; QS[username]=example; QS[password]=example.com; QS[pmscount]=1',
            'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2774.3 Safari/537.36',
            'X-Requested-With':'XMLHttpRequest'
        }

        def parse(self, response):
            hrefs = response.xpath('a/@href')
            for href in hrefs:
                url = response.urljoin(href.extract())
                yield scrapy.Request(url, callback=self.parse_article_contents)

        def parse_article_contents(self, response):
            for sel in response.xpath('/html/body'):
                item = ExampleItem()
                item['articleUrl'] = response.url
                item['title'] = sel.xpath('div[3]/a[2]/@href')[0].extract()
                item['content'] = sel.xpath('div[2]/div[2]/div[1]/text()')[0].extract()
                # The article page has an AJAX request that shows the author; build that request here:
                articleId = re.search(u'id=(\d{1,4})&', item['articleUrl']).group(1)
                articleAuthorUrl = 'http://www.example.com/plus/ajax_author.php?id=' + articleId

                # Crawl the author. Is it correct to write it like this?
                def request_article_author(self):
                    return scrapy.Request(url=articleAuthorUrl,headers=headers,callback=self.parse_article_author)

                def parse_article_author(self, response):
                    item['author'] = response.xpath('/html/body/div/div[1]/div[2]/text()').extract()

                # Can the "yield item" below yield the item['author'] set in the nested function above?
                yield item

Solution 1:

The real URL isn't given, so the error can't be reproduced and the question can't be answered.

Original link: https://www.weikejianghu.com/program/faq/20176/143687.html
